AITopics | information measure

Estimators for Multivariate Information Measures in General Probability Spaces

Neural Information Processing Systems | Mar-17-2026, 00:36:59 GMT

Information-theoretic quantities play an important role in various settings in machine learning, including causality testing, structure inference in graphical models, time-series problems, and feature selection, as well as in providing privacy guarantees. A key quantity of interest is the mutual information and generalizations thereof, including conditional mutual information, multivariate mutual information, total correlation, and directed information. While these information quantities are well defined in arbitrary probability spaces, existing estimators employ a $\Sigma H$ method, which works only in purely discrete or purely continuous settings, since entropy (or differential entropy) is well defined only in those regimes. In this paper, we define a general graph divergence measure ($\mathbb{GDM}$) that generalizes the aforementioned information measures, and we construct a novel estimator via a coupling trick that directly estimates these multivariate information measures using the Radon-Nikodym derivative. These estimators are proven to be consistent in a general setting that includes several cases where the existing estimators fail, thus providing the only known estimators for the following settings: (1) the data has some discrete and some continuous-valued components; (2) some (or all) of the components themselves are discrete-continuous mixtures; (3) the data is real-valued but does not have a joint density on the entire space and is instead supported on a low-dimensional manifold. We show that our proposed estimators significantly outperform known estimators on synthetic and real datasets.
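
For orientation, the contrast the abstract draws can be written out for plain mutual information. The block below uses the standard textbook definitions, not the paper's $\mathbb{GDM}$ notation; the symbols $P_{XY}$, $P_X$, $P_Y$ and $H$ are assumptions introduced here for illustration. The first line is the $\Sigma H$ decomposition, which needs each entropy term to exist. The second is the Radon-Nikodym form, which only needs the joint law to be absolutely continuous with respect to the product of the marginals.

```latex
% Sigma-H form: requires all three (differential) entropies to be well defined
I(X;Y) = H(X) + H(Y) - H(X,Y)

% Radon-Nikodym form: valid whenever P_{XY} \ll P_X \otimes P_Y,
% including mixed discrete-continuous variables
I(X;Y) = \int \log \frac{dP_{XY}}{d(P_X \otimes P_Y)} \, dP_{XY}
```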

artificial intelligence, machine learning, proceedings, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios

Neural Information Processing Systems | Feb-10-2026, 04:27:38 GMT

Active learning has proven to be useful for minimizing labeling costs by selecting the most informative samples.

artificial intelligence, learning, machine learning, (18 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

753fec797a22f71baf7106833734fdf3-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 20:47:36 GMT

correlation, entropy, memorization, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Self-Supervised Learning with an Information Maximization Criterion

Neural Information Processing Systems | Dec-25-2025, 13:46:24 GMT

Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labeling. Mode collapse, i.e., the model producing identical representations for all inputs, is a central problem in many self-supervised learning approaches, making self-supervised tasks, such as matching distorted variants of the inputs, ineffective. In this article, we argue that a straightforward application of information maximization among alternative latent representations of the same input naturally solves the collapse problem and achieves competitive empirical results. We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure that reflects the level of correlation among its arguments. Maximizing this correlative information measure between alternative representations of the same input serves two purposes: (1) it avoids the collapse problem by generating feature vectors with non-degenerate covariances; (2) it establishes relevance among alternative representations by increasing the linear dependence among them. An approximation of the proposed information maximization objective simplifies to a Euclidean distance-based objective function regularized by the log-determinant of the feature covariance matrix.
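
The last sentence of the abstract describes a distance term regularized by a log-determinant. A minimal NumPy sketch of an objective with that general shape is given below. It is not the CorInfoMax implementation: the function name `logdet_regularized_loss`, the weights `alpha` and `eps`, and the exact way the terms are combined are all assumptions made here for illustration.

```python
import numpy as np


def logdet_regularized_loss(z1, z2, eps=1e-3, alpha=1.0):
    """Hypothetical loss: Euclidean distance between two views minus
    the log-determinants of their (eps-regularized) feature covariances.

    z1, z2: arrays of shape (batch, dim) holding two representations
    of the same inputs. Lower is better.
    """
    d = z1.shape[1]

    # Mean squared Euclidean distance between paired representations.
    dist = np.mean(np.sum((z1 - z2) ** 2, axis=1))

    def logdet_cov(z):
        # Sample covariance over the batch; eps * I keeps it positive definite.
        cov = np.atleast_2d(np.cov(z, rowvar=False)) + eps * np.eye(d)
        _, logabsdet = np.linalg.slogdet(cov)
        return logabsdet

    # A collapsed representation has near-singular covariance, so its
    # log-determinant is very negative and the loss becomes large.
    return alpha * dist - logdet_cov(z1) - logdet_cov(z2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spread = rng.normal(size=(256, 4))   # non-degenerate features
    collapsed = np.ones((256, 4))        # identical output for every input
    print("spread   :", logdet_regularized_loss(spread, spread))
    print("collapsed:", logdet_regularized_loss(collapsed, collapsed))
```

In this sketch the collapsed batch is penalized only through the log-determinant terms, since its distance term is zero; that is the mechanism the abstract credits with avoiding collapse.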

information maximization criterion, representation, self-supervised learning, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Estimators for Multivariate Information Measures in General Probability Spaces

Arman Rahimzamani, Himanshu Asnani, Pramod Viswanath, Sreeram Kannan

Neural Information Processing Systems | Nov-20-2025, 19:54:16 GMT

A key quantity of interest is the mutual information and generalizations thereof, including conditional mutual information, multivariate mutual information, total correlation and directed information.

artificial intelligence, estimator, machine learning, (15 more...)

Country:

North America > United States > Illinois (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Quality Over Quantity: Curating Contact-Based Robot Datasets Improves Learning

Hrishikesh Sathyanarayan, Victor Vantilborgh, Ian Abraham

arXiv.org Artificial Intelligence | Oct-22-2025

In this paper, we investigate the utility of datasets and whether more data or the 'right' data is advantageous for robot learning. In particular, we are interested in quantifying the utility of contact-based data, as contact holds significant information for robot learning. Our approach derives a contact-aware objective function for learning object dynamics and shape from pose and contact data. We show that the contact-aware Fisher-information metric can be used to rank and curate contact data based on how informative the data is for learning. In addition, we find that selecting a reduced dataset based on this ranking improves the learning task while also making learning a deterministic process. Interestingly, our results show that more data is not necessarily advantageous; rather, less but more informative data can accelerate learning, depending on the contact interactions. Last, we show how our metric can be used to provide initial guidance on data curation for contact-based robot learning.

artificial intelligence, information, machine learning, (18 more...)

2510.18137

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.33)

A Semantic Generalization of Shannon's Information Theory and Applications

Chenguang Lu

arXiv.org Artificial Intelligence | Oct-21-2025

Does semantic communication require a semantic information theory parallel to Shannon's information theory, or can Shannon's work be generalized for semantic communication? This paper advocates for the latter and introduces a semantic generalization of Shannon's information theory (G theory for short). The core idea is to replace the distortion constraint with the semantic constraint, achieved by utilizing a set of truth functions as a semantic channel. These truth functions enable the expressions of semantic distortion, semantic information measures, and semantic information loss. Notably, the maximum semantic information criterion is equivalent to the maximum likelihood criterion and similar to the Regularized Least Squares criterion. This paper shows G theory's applications to daily and electronic semantic communication, machine learning, constraint control, Bayesian confirmation, portfolio theory, and information value. The improvements in machine learning methods involve multilabel learning and classification, maximum mutual information classification, mixture models, and solving latent variables. Furthermore, insights from statistical physics are discussed: Shannon information is similar to free energy; semantic information to free energy in local equilibrium systems; and information efficiency to the efficiency of free energy in performing work. The paper also proposes refining Friston's minimum free energy principle into the maximum information efficiency principle. Lastly, it compares G theory with other semantic information theories and discusses its limitation in representing the semantics of complex data.

information, machine learning, natural language, (21 more...)

doi: 10.3390/e27050461

2510.15871

Country:

Asia (0.93)
Europe > United Kingdom > England (0.28)
North America > United States > Illinois (0.28)

Genre: Research Report (0.82)

Industry:

Banking & Finance (1.00)
Transportation > Ground > Road (0.92)
Transportation > Passenger (0.67)

753fec797a22f71baf7106833734fdf3-Paper-Conference.pdf

Neural Information Processing Systems | Aug-16-2025, 00:04:22 GMT

machine learning, memorization, natural language, (17 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(16 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

The Conditional Regret-Capacity Theorem for Batch Universal Prediction

Marco Bondaschi, Michael Gastpar

arXiv.org Machine Learning | Aug-15-2025

We derive a conditional version of the classical regret-capacity theorem. This result can be used in universal prediction to find lower bounds on the minimal batch regret, which is a recently introduced generalization of the average regret, when batches of training data are available to the predictor. As an example, we apply this result to the class of binary memoryless sources. Finally, we generalize the theorem to Rényi information measures, revealing a deep connection between the conditional Rényi divergence and the conditional Sibson's mutual information. Prediction of the continuation of a sequence from its own past is one of the central problems of statistics, science, and engineering.
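
For context, the classical (unconditional) regret-capacity theorem that this abstract extends can be stated as below. This is the standard textbook form, not the paper's conditional result; the symbols $P_\theta$, $Q$, $w$ and $C$ are notation assumed here for illustration, with $\theta$ indexing the source class, $Q$ the predictor's distribution, and $w$ a prior over $\theta$.

```latex
% Minimax redundancy equals the capacity of the "channel" from theta to X
\min_{Q} \, \sup_{\theta \in \Theta} D(P_\theta \,\|\, Q)
  \;=\; \sup_{w} \, I_w(\theta; X)
  \;=\; C
```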

artificial intelligence, machine learning, predictor, (17 more...)

2508.10282

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes

James P. Crutchfield, Alexandra Jurgens

arXiv.org Artificial Intelligence | Aug-4-2025

We develop information theory for the temporal behavior of memoryful agents moving through complex -- structured, stochastic -- environments. We introduce and explore information processes -- stochastic processes produced by cognitive agents in real-time as they interact with and interpret incoming stimuli. We provide basic results on the ergodicity and semantics of the resulting time series of Shannon information measures that monitor an agent's adapting view of uncertainty and structural correlation in its environment.

artificial intelligence, information, machine learning, (19 more...)

2505.19275

Country: